SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates
Authors
Abstract
MOTIVATION: Single nucleotide polymorphism (SNP) detection that exploits redundancy in expressed sequence tag (EST) collections, arising from the presence of transcripts of the same gene from different individuals, has been used to generate large collections of SNPs for many species. A second source of redundancy, namely that EST collections can contain multiple transcripts of the same gene from the same individual, can be exploited to distinguish true SNPs from sequencing errors. In this article, we demonstrate with Atlantic salmon and pig EST collections that splitting the EST collection in two, detecting SNPs in both subsets, and then accepting only cross-validated SNPs increases validation rates.
RESULTS: In the pig data set, 676 cross-validated putative SNPs were detected in a collection of 160,689 ESTs. When a subset of these was validated by genotyping on MassARRAY, 85.1% of the SNPs were polymorphic in successful assays. In the salmon data set, 856 cross-validated putative SNPs were detected in a collection of 243,674 ESTs. Validation by genotyping showed that 81.0% of the cross-validated putative SNPs were polymorphic in successful assays.
AVAILABILITY: Cross-validated SNPs are available at dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), ss69371838-ss69372575 for the salmon SNPs and ss69372587-ss69373226 for the pig SNPs.
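The cross-validation idea in the abstract (detect SNPs independently in two subsets of the EST collection, then keep only the calls found in both) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say how the collection was split or which SNP caller was used, so the read representation, the toy caller `call_snps`, its `min_minor_count` threshold, and the helper `cross_validated_snps` are all hypothetical names introduced here for illustration.

```python
from collections import Counter, defaultdict


def call_snps(reads, min_minor_count=2):
    """Toy SNP caller (an assumption, not the method used in the paper).

    `reads` is a list of (contig_id, offset, sequence) tuples that are
    assumed to be already clustered and aligned without gaps.
    A position is reported as a putative SNP when at least two different
    bases are each seen in `min_minor_count` or more reads.
    """
    columns = defaultdict(Counter)
    for contig_id, offset, seq in reads:
        for i, base in enumerate(seq):
            columns[(contig_id, offset + i)][base] += 1

    snps = set()
    for (contig_id, pos), counts in columns.items():
        alleles = frozenset(b for b, n in counts.items() if n >= min_minor_count)
        if len(alleles) >= 2:
            snps.add((contig_id, pos, alleles))
    return snps


def cross_validated_snps(subset_a, subset_b, caller=call_snps):
    """Keep only putative SNPs detected independently in both subsets."""
    return caller(subset_a) & caller(subset_b)


# Hypothetical example data: one contig, reads split into two subsets.
subset_a = [("c1", 0, s) for s in ["ACGT", "ACGT", "ATGT", "ATGA", "ACGA"]]
subset_b = [("c1", 0, s) for s in ["ACGT", "ACGT", "ATGT", "ATGT"]]

putative_a = call_snps(subset_a)  # includes a spurious call at position 3
result = cross_validated_snps(subset_a, subset_b)
print(sorted((c, p, sorted(a)) for c, p, a in result))
```

In this toy data, the C/T difference at position 1 is seen in both subsets and survives, while the A/T difference at position 3 appears only in the first subset and is discarded, which is the behaviour the abstract attributes to cross-validation.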
Similar resources
Designing and Validation of One-Step T-ARMS-PCR for Genotyping the eNOS rs1799983 SNP
Background: The transversion of G to T (G894T) in human endothelial nitric oxide synthase (eNOS) gene has profound effects such as male infertility, recurrent miscarriage, multiple sclerosis and cardiovascular diseases. Objectives: Development of a new Multiplex Tetra-Primer Amplification Refractory Mutation System - Polymerase Chain Reaction (T-ARMS-PCR) for detection of...
Automated Clustering and Assembly of Large EST Collections
The availability of large EST (Expressed Sequence Tag) databases has led to a revolution in the way new genes are cloned. Difficulties arise, however, due to high error rates and redundancy of raw EST data. For these reasons, one of the first tasks performed by a scientist investigating any EST of interest is to gather contiguous ESTs and assemble them into a larger virtual cDNA. The REX (Recur...
Unlocking Diversity in Germplasm Collections via Genomic Selection: A Case Study Based on Quantitative Adult Plant Resistance to Stripe Rust in Spring Wheat.
Harnessing diversity from germplasm collections is more feasible today because of the development of lower-cost and higher-throughput genotyping methods. However, the cost of phenotyping is still generally high, so efficient methods of sampling and exploiting useful diversity are needed. Genomic selection (GS) has the potential to enhance the use of desirable genetic variation in germplasm coll...
Mining SNPs from EST sequences using filters and ensemble classifiers.
Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method t...
High-throughput identification, database storage and analysis of SNPs in EST sequences.
Single nucleotide polymorphisms (SNPs) are the most frequent form of DNA variation and disease-causing mutations in many genes. Due to their abundance and slow mutation rate within generations, they are thought to be the next generation of genetic markers that can be used in a myriad of important biological, genetic, pharmacological, and medical applications. There are several strategies both e...
Journal title:
- Bioinformatics
Volume 23, Issue 13
Pages: -
Publication date: 2007